RANDOM MATRICES: LAW OF THE DETERMINANT 
Abstract. Let $A_n$ be an $n$ by $n$ random matrix whose entries are independent real random variables satisfying some natural conditions. We show that the logarithm of $|\det A_n|$ satisfies a central limit theorem. More precisely,

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\log(|\det A_n|) - \frac12\log (n-1)!}{\sqrt{\frac12\log n}} \le x\right) - P(N(0,1)\le x)\right| \le \log^{-1/3+o(1)} n. \]




1. Introduction 



Let $A_n$ be an $n$ by $n$ random matrix whose entries $a_{ij}$, $1 \le i,j \le n$, are independent real random variables of zero mean and unit variance. We will refer to the entries $a_{ij}$ as the atom variables.

As the determinant is one of the most fundamental matrix functions, it is a basic problem in the theory of random matrices to study the distribution of $\det A_n$, and indeed this study has a long and rich history. The earliest paper we find on the subject is a paper of Szekeres and Turán from 1937, in which they studied an extremal problem. In the 1950s, there was a series of papers jSl [151 EH E] devoted to the computation of moments of fixed orders of $\det A_n$ (see also [S]). The explicit formulas for higher moments become very complicated and are in general not available, except in the case when the atom variables have some special distribution (see, for instance

One can use the estimate for the moments and Markov's inequality to obtain an upper bound on $|\det A_n|$. However, no lower bound was known for a long time. In particular, Erdős asked whether $\det A_n$ is non-zero with probability tending to one. In 1967, Komlós [12], [13] addressed this question, proving that almost surely $|\det A_n| > 0$ for random Bernoulli matrices (where the atom variables are iid Bernoulli, taking values $\pm 1$ with probability $1/2$). His method also works for much more general models. Following [12], the upper bound on the probability that $\det A_n = 0$ has been improved in [TTl [201 IHl [3]. However, these results do not say much about the value of $|\det A_n|$ itself.

In a recent paper [20] , Tao and the second author proved that for Bernoulli random matrices, 
with probability tending to one (as n tends to infinity) 
for any function $\omega(n)$ tending to infinity with $n$. This shows that almost surely, $\log|\det A_n|$ is $(\frac12 + o(1))\, n\log n$, but does not provide any distributional information. For related works concerning other models of random matrices, we refer to [17].

In [S], Goodman considered random gaussian matrices where the atom variables are iid standard gaussian variables. He noticed that in this case the square of the determinant is a product of independent Chi-square variables. Therefore, its logarithm is the sum of independent variables and thus one expects a central limit theorem to hold. In fact, using properties of the Chi-square distribution, it is not very hard to prove

\[ \frac{\log(|\det A_n|) - \frac12\log (n-1)!}{\sqrt{\frac12\log n}} \to N(0,1). \tag{2} \]

In [6], Girko stated that (2) holds for general random matrices under the additional assumption that the fourth moment of the atom variables is 3. Twenty years later, he claimed a much stronger result which replaced the above assumption by the assumption that the atom variables have bounded $(4+\delta)$-th moment. However, there are points which are not clear in these papers and we have not found any researcher who can explain the whole proof to us. In our own attempt, we could not get past the proof of Theorem 2 in [7]. In particular, definition (3.7) of that paper requires the matrix to be invertible, but this assumption can easily fail.

In this paper, we provide a transparent proof for the central limit theorem of the log- 
determinant. The next question to consider, naturally, is the rate of convergence. We are 
able to obtain a rate which we believe to be near optimal. 

We say that a random variable $\xi$ satisfies condition CO (with positive constants $C_1, C_2$) if

\[ P(|\xi| \ge t) \le C_1\exp(-t^{C_2}) \tag{3} \]

for all $t > 0$.

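Theorem 1.1 (Main theorem). Assume that all atom variables $a_{ij}$ satisfy condition CO with some positive constants $C_1, C_2$. Then

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\log(|\det A_n|) - \frac12\log (n-1)!}{\sqrt{\frac12\log n}} \le x\right) - \Phi(x)\right| \le \log^{-1/3+o(1)} n. \tag{4} \]

Here and later, $\Phi(x) = P(N(0,1) \le x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp(-t^2/2)\,dt$. In the remaining part of the paper, we will actually prove the following equivalent form,

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\log(\det A_n^2) - \log (n-1)!}{\sqrt{2\log n}} \le x\right) - \Phi(x)\right| \le \log^{-1/3+o(1)} n. \tag{5} \]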
To give some feeling about (5), let us consider the case when the $a_{ij}$ are iid standard gaussian. For $0 \le i \le n-1$, let $V_i$ be the subspace generated by the first $i$ rows of $A_n$. Let $\Delta_{i+1}$ denote the distance from $\mathbf a_{i+1}$ to $V_i$, where $\mathbf a_{i+1} = (a_{i+1,1}, \dots, a_{i+1,n})$ is the $(i+1)$-th row vector of $A_n$. Then, by the "base times height" formula, we have

\[ \det A_n^2 = \prod_{i=0}^{n-1}\Delta_{i+1}^2. \tag{6} \]

Therefore,

\[ \log\det A_n^2 = \sum_{i=0}^{n-1}\log\Delta_{i+1}^2. \tag{7} \]
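The "base times height" identity can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper: the helper names `squared_distances` and `log_abs_det` are illustrative, the distances are computed by modified Gram-Schmidt, and the matrix is assumed to be non-singular (which holds with probability one for gaussian entries).

```python
import math
import random


def squared_distances(rows):
    """Return [D_1^2, ..., D_n^2], where D_{i+1} is the distance from
    row i+1 to the span of the first i rows (modified Gram-Schmidt).
    Assumes the rows are linearly independent."""
    basis = []  # orthonormal basis of the span of the rows processed so far
    result = []
    for row in rows:
        residual = list(row)
        for q in basis:
            coeff = sum(r * x for r, x in zip(residual, q))
            residual = [r - coeff * x for r, x in zip(residual, q)]
        d2 = sum(r * r for r in residual)
        result.append(d2)
        norm = math.sqrt(d2)
        basis.append([r / norm for r in residual])
    return result


def log_abs_det(rows):
    """log|det| via Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular."""
    a = [list(r) for r in rows]
    n = len(a)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        total += math.log(abs(a[c][c]))
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return total


rng = random.Random(0)
n = 8
A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
lhs = 2.0 * log_abs_det(A)                               # log det(A)^2
rhs = sum(math.log(d2) for d2 in squared_distances(A))   # sum of log distances^2
```

Up to floating-point error, `lhs` and `rhs` agree for any non-singular matrix, whatever the distribution of its entries; the gaussian assumption only enters when one wants the distances to be independent.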



As the $a_{ij}$ are iid standard gaussian, the $\Delta_{i+1}^2$ are independent Chi-square random variables of degree $n-i$. Thus, the right hand side of (7) is a sum of independent random variables. Notice that $\Delta_{i+1}^2$ has mean $n-i$ and variance $O(n-i)$ and is very strongly concentrated. Thus, with high probability $\log\Delta_{i+1}^2$ is roughly $\log((n-i) + O(\sqrt{n-i}))$ and so it is easy to show that $\log\Delta_{i+1}^2$ has mean close to $\log(n-i)$ and variance $O(\frac{1}{n-i})$. So the variance of $\sum_{i=0}^{n-1}\log\Delta_{i+1}^2$ is $O(\log n)$. To get the precise value $\sqrt{2\log n}$ of the normalization one needs to carry out some careful (but rather routine) calculation, which we leave as an exercise.
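One possible route through this exercise is sketched below. It is not taken from the paper; it assumes the standard formulas for the mean and variance of the logarithm of a Chi-square variable in terms of the digamma function $\psi$ and its derivative, together with their usual asymptotic expansions.

```latex
% Assumed standard facts for a Chi-square variable with k degrees of freedom:
%   E log(chi^2_k) = psi(k/2) + log 2,   Var log(chi^2_k) = psi'(k/2),
%   psi(x) = log x - 1/(2x) + O(x^{-2}),  psi'(x) = 1/x + 1/(2x^2) + O(x^{-3}).
\begin{align*}
\mathbf{E}\log\chi^2_k &= \psi(k/2) + \log 2 = \log k - \frac{1}{k} + O(k^{-2}),\\
\operatorname{Var}\log\chi^2_k &= \psi'(k/2) = \frac{2}{k} + O(k^{-2}).
\end{align*}
% Summing over k = n - i = 1, ..., n and using independence:
\begin{align*}
\sum_{k=1}^{n}\mathbf{E}\log\chi^2_k
  &= \log n! - \log n + O(1) = \log (n-1)! + O(1),\\
\sum_{k=1}^{n}\operatorname{Var}\log\chi^2_k
  &= 2\log n + O(1).
\end{align*}
```

Under these assumptions the centering $\log(n-1)!$ and the normalization $\sqrt{2\log n}$ both appear, each up to an additive $O(1)$ error before normalization.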

The reason for which we think that the rate $\log^{-1/3+o(1)} n$ might be near optimal is that (as the reader will see through the proofs) $2\log n$ is only an asymptotic value of the variance of $\log\det A_n^2$. This approximation has an error term of order at least $\Omega(1)$, and since $\sqrt{2\log n + \Omega(1)} - \sqrt{2\log n} = \Omega(\log^{-1/2} n)$, it seems that one cannot have a rate of convergence better than $\log^{-1/2+o(1)} n$. It is a quite interesting question whether one can obtain a polynomial rate by replacing $\log(n-1)!$ and $2\log n$ by other, relatively simple, functions of $n$.

Our arguments rely on recent developments in random matrix theory and look quite different 
from those in Girko's papers. In particular, we benefit from the arguments developed in 
[20], [22], [23]. We also use Talagrand's famous concentration inequality frequently to obtain
most of the large deviation results needed in this paper. 

Notation. We say that an event $E$ holds almost surely if $P(E)$ tends to one as $n$ tends to infinity. For an event $A$, we use the subscript $P_{\mathbf x}(A)$ to emphasize that the probability under consideration is taken according to the random vector $\mathbf x$. For $1 \le s \le n$, we denote by $\mathbf e_s$ the unit vector $(0, \dots, 0, 1, 0, \dots, 0)$, where all but the $s$-th component are zero. All standard asymptotic notation such as $O$, $\Omega$, $o$, etc. is used under the assumption that $n \to \infty$.
Figure 1. The plot compares the distributions of $(\log(\det A_n^2) - \log(n-1)!)/\sqrt{2\log n}$ for random Bernoulli matrices, random Gaussian matrices, and $N(0,1)$. We sampled 1000 matrices of size 1000 by 1000 for each ensemble.
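An experiment of this kind can be sketched in a few lines. The code below is an illustrative reconstruction in plain Python, not the authors' script: the function names are ours, the matrix size is kept small because the elimination is cubic in $n$, and singular Bernoulli samples (for which the logarithm is undefined) are simply skipped.

```python
import math
import random


def log_abs_det(rows):
    """log|det| by Gaussian elimination with partial pivoting;
    returns -inf when a zero pivot shows the matrix is singular."""
    a = [list(map(float, r)) for r in rows]
    n = len(a)
    total = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        if pivot == 0.0:
            return float("-inf")
        total += math.log(abs(pivot))
        for r in range(c + 1, n):
            f = a[r][c] / pivot
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return total


def normalized_log_det(rows):
    """(log det(A)^2 - log (n-1)!) / sqrt(2 log n)."""
    n = len(rows)
    # math.lgamma(n) equals log((n-1)!)
    return (2.0 * log_abs_det(rows) - math.lgamma(n)) / math.sqrt(2.0 * math.log(n))


def sample_statistics(n, trials, entry, seed=0):
    """Draw `trials` random n-by-n matrices with iid entries entry(rng)
    and return the finite values of the normalized log-determinant."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        A = [[entry(rng) for _ in range(n)] for _ in range(n)]
        value = normalized_log_det(A)
        if math.isfinite(value):  # skip singular samples
            values.append(value)
    return values


def bernoulli(rng):
    return rng.choice((-1.0, 1.0))


def gaussian(rng):
    return rng.gauss(0.0, 1.0)


gaussian_values = sample_statistics(20, 50, gaussian)
bernoulli_values = sample_statistics(20, 50, bernoulli)
```

For sizes comparable to those in the figure, a numerical library routine such as NumPy's `numpy.linalg.slogdet` would be the practical replacement for the hand-written elimination.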



2. Our approach and main lemmas 



We first make two extra assumptions about $A_n$. We assume that the entries $a_{ij}$ are bounded in absolute value by $\log^\beta n$ for some constant $\beta > 0$ and that $A_n$ has full rank with probability one. We will prove Theorem 1.1 under these two extra assumptions. In Appendix A we will explain why we can implement these assumptions without violating the generality of Theorem 1.1.

Theorem 2.1 (Main theorem with extra assumptions). Assume that all atom variables $a_{ij}$ satisfy condition CO and are bounded in absolute value by $\log^\beta n$ for some constant $\beta$. Assume furthermore that $A_n$ has full rank with probability one. Then

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\log(|\det A_n|) - \frac12\log (n-1)!}{\sqrt{\frac12\log n}} \le x\right) - \Phi(x)\right| \le \log^{-1/3+o(1)} n. \tag{8} \]



In the first, and main, step of the proof, we prove the claim of Theorem 2.1 but with the last $\log^\alpha n$ rows being replaced by gaussian rows (for some properly chosen constant $\alpha$). We remark that the replacement trick was also used in [7], but for an entirely different reason. Our reason here is that for the last few rows, Lemma 2.4 is not very effective.

Theorem 2.2. For any constant $\beta > 1$ the following holds for any sufficiently large constant $\alpha > 0$. Let $A_n$ be an $n$ by $n$ matrix whose entries $a_{ij}$, $i \le n_0$, $1 \le j \le n$, are independent real random variables of zero mean, unit variance and absolute values at most $\log^\beta n$. Assume furthermore that $A_n$ has full rank with probability one and the components of the last $\log^\alpha n$ rows of $A_n$ are independent standard gaussian random variables. Then

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\log(\det A_n^2) - \log (n-1)!}{\sqrt{2\log n}} \le x\right) - \Phi(x)\right| \le \log^{-1/3+o(1)} n. \tag{9} \]
In the second (and simpler) step of the proof, we carry out a replacement procedure, replacing the gaussian rows by the original rows one at a time, and show that the replacement does not affect the central limit theorem. This step is motivated by the Lindeberg replacement method used in

We present the verification of Theorem 2.1 using Theorem 2.2 in Section 8. In the rest of this section, we focus on the proof of Theorem 2.2.



Notice that in the setting of this theorem, the variables $\Delta_i$ are no longer independent. However, with some work, we can make the RHS of (7) into a sum of martingale differences plus a negligible error, which lays the ground for an application of a central limit theorem for martingales. (In [7], Girko also used the CLT for martingales via the base times height formula, but his analysis looks very different from ours.) We are going to use the following theorem, due to Machkouri and Ouchti [14].



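Theorem 2.3 (Central limit theorem for martingales). [14, Theorem 1] There exists an absolute constant $L$ such that the following holds. Assume that $X_1, \dots, X_m$ are martingale differences with respect to the nested $\sigma$-algebras $\mathcal E_0, \mathcal E_1, \dots, \mathcal E_{m-1}$. Let $v_m^2 := \sum_{i=0}^{m-1} E(X_{i+1}^2|\mathcal E_i)$ and $s_m^2 := \sum_{i=0}^{m-1} E(X_{i+1}^2)$. Assume that $E(|X_{i+1}|^3|\mathcal E_i) \le \gamma_i E(X_{i+1}^2|\mathcal E_i)$ with probability one for all $i$, where $(\gamma_i)_{i=0}^{m-1}$ is a sequence of positive real numbers. Then we have

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\sum_{0\le i < m} X_{i+1}}{s_m} \le x\right) - \Phi(x)\right| \le L\left(\frac{\max\{\gamma_0, \dots, \gamma_{m-1}\}\log m}{\min\{s_m, 2^m\}} + E^{1/3}\left|\frac{v_m^2}{s_m^2} - 1\right|\right). \]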



To make use of this theorem, we need some preparation. Conditioning on the first $i$ rows $\mathbf a_1, \dots, \mathbf a_i$, we can view $\Delta_{i+1}$ as the distance from a random vector to $V_i := \mathrm{Span}(\mathbf a_1, \dots, \mathbf a_i)$. Since $A_n$ has full rank with probability one, $\dim V_i = i$ with probability one for all $i$. The following is a direct corollary of [24, Lemma 43].

Lemma 2.4. For any constant $\beta > 0$ there is a constant $C_3 > 0$ depending on $\beta$ such that the following holds. Assume that $V \subset \mathbb R^n$ is a subspace of dimension $\dim(V) \le n-4$. Let $\mathbf a$ be a random vector whose components are independent variables of zero mean and unit variance and absolute values at most $\log^\beta n$. Denote by $\Delta$ the distance from $\mathbf a$ to $V$. Then we have

\[ E(\Delta^2) = n - \dim(V) \]

and for any $t > 0$

\[ P\big(|\Delta - \sqrt{n - \dim(V)}| \ge t\big) = O\left(\exp\left(-\frac{t^2}{\log^{C_3} n}\right)\right). \]
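The identity for the mean of the squared distance can be verified exactly in a small case. The sketch below is illustrative and not from the paper: it takes the components of the random vector to be uniform on $\{-1, +1\}$, fixes an arbitrary 2-dimensional subspace of $\mathbb R^6$ (our choice), and averages the squared distance over all $2^6$ sign vectors.

```python
import itertools
import math


def orthonormal_basis(vectors):
    """Gram-Schmidt; assumes the input vectors are linearly independent."""
    basis = []
    for v in vectors:
        r = [float(x) for x in v]
        for q in basis:
            c = sum(x * y for x, y in zip(r, q))
            r = [x - c * y for x, y in zip(r, q)]
        norm = math.sqrt(sum(x * x for x in r))
        basis.append([x / norm for x in r])
    return basis


def squared_distance(a, basis):
    """Squared distance from a to the span of an orthonormal basis."""
    along = sum(sum(x * y for x, y in zip(a, q)) ** 2 for q in basis)
    return sum(x * x for x in a) - along


n = 6
# An arbitrary (hypothetical) subspace V of dimension 2.
V = orthonormal_basis([[1, 2, 0, -1, 3, 1], [0, 1, 1, 4, -2, 5]])
signs = list(itertools.product((-1.0, 1.0), repeat=n))
mean_sq = sum(squared_distance(a, V) for a in signs) / len(signs)
```

Here `mean_sq` equals $n - \dim(V) = 4$ up to rounding, because the average of $a_s a_t$ over all sign vectors is 1 when $s = t$ and 0 otherwise. The concentration part of the lemma is a statement about large $n$ and is not tested by this small example.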



Set

\[ n_0 := n - \log^\alpha n, \]

where $\alpha$ is a sufficiently large constant (which may depend on $\beta$). We will use the shorthand $k_i$ to denote $n - i$, the co-dimension of $V_i$ (and the expected value of $\Delta_{i+1}^2$).

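We next consider each term of the right hand side of (7) where $i < n_0$. Using Taylor expansion, we write

\[ \log\frac{\Delta_{i+1}^2}{k_i} = \log\left(1 + \frac{\Delta_{i+1}^2 - k_i}{k_i}\right) = \frac{\Delta_{i+1}^2 - k_i}{k_i} - \frac12\left(\frac{\Delta_{i+1}^2 - k_i}{k_i}\right)^2 + R_{i+1} = X_{i+1} - \frac{X_{i+1}^2}{2} + R_{i+1}, \]

where

\[ X_{i+1} := \frac{\Delta_{i+1}^2 - k_i}{k_i}, \quad\text{and}\quad R_{i+1} := \log(1 + X_{i+1}) - \left(X_{i+1} - \frac{X_{i+1}^2}{2}\right). \]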

By Lemma 2.4 and by choosing $\alpha$ sufficiently large, we have with probability at least $1 - O(\exp(-\log^2 n))$ (the probability here is with respect to the random $(i+1)$-th row, fixing the first $i$ rows arbitrarily)

\[ |X_{i+1}| = O(k_i^{-3/8}) = O((n-i)^{-3/8}) = o(1). \tag{10} \]

Thus, with probability at least $1 - O(\exp(-\log^2 n))$,

\[ |R_{i+1}| = O(|X_{i+1}|^3) = O((n-i)^{-9/8}). \]
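The cubic bound on the remainder is the usual estimate for the second-order Taylor expansion of $\log(1+x)$. A quick numerical sanity check, written as a small sketch with an illustrative function name and an arbitrary grid on $[-1/2, 1/2]$:

```python
import math


def taylor_remainder(x):
    """R(x) = log(1 + x) - (x - x^2 / 2)."""
    return math.log1p(x) - (x - 0.5 * x * x)


# For |x| <= 1/2 the series gives |R(x)| <= |x|^3 / (3 (1 - |x|)) <= (2/3) |x|^3.
grid = [k / 1000.0 for k in range(-500, 501) if k != 0]
worst_ratio = max(abs(taylor_remainder(x)) / abs(x) ** 3 for x in grid)
```

On this grid `worst_ratio` stays below $2/3$, in line with $|R_{i+1}| = O(|X_{i+1}|^3)$ once $|X_{i+1}|$ is small.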

Hence, by Bayes' formula, the following holds with probability at least $(1 - O(\exp(-\log^2 n)))^{n_0} \ge 1 - O(\exp(-\log^2 n/2))$:

\[ \sum_{i < n_0} R_{i+1} = O\Big(\sum_{i < n_0}(n-i)^{-9/8}\Big) = o(\log^{-2} n), \]

again by having $\alpha$ sufficiently large.
We conclude that

Lemma 2.5. With probability at least $1 - O(\exp(-\log^2 n/2))$

\[ \frac{\sum_{i < n_0} R_{i+1}}{\sqrt{2\log n}} = o\left(\frac{\log^{-2} n}{\sqrt{2\log n}}\right). \]



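We will need three other lemmas.

Lemma 2.6 (Main contribution).

\[ \sup_{x\in\mathbb R}\left| P\left(\frac{\sum_{i < n_0} X_{i+1}}{\sqrt{2\log n}} \le x\right) - P(N(0,1) \le x)\right| \le \log^{-1/3+o(1)} n. \]

Lemma 2.7 (Quadratic terms).

\[ P\left(\frac{\big|-\sum_{i < n_0}\frac{X_{i+1}^2}{2} + \log n\big|}{\sqrt{2\log n}} \ge \log^{-1/3+o(1)} n\right) \le \log^{-1/3+o(1)} n. \]

Lemma 2.8 (Last few rows). For any constant $0 < c < 1/100$

\[ P\left(\frac{\big|\sum_{n_0 \le i \le n-1}\log\frac{\Delta_{i+1}^2}{n-i}\big|}{\sqrt{2\log n}} \ge \log^{-1/2+c} n\right) = o(\exp(-\log^{c/2} n)). \]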



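Theorem 2.2 follows from the above four lemmas and the following trivial fact (used repeatedly and with proper scaling)

\[ P(A + B \le \sigma x) \le P(A \le \sigma(x - \epsilon)) + P(B \le \sigma\epsilon), \]

which holds for every real $\epsilon$, since $A > \sigma(x - \epsilon)$ and $B > \sigma\epsilon$ together force $A + B > \sigma x$. The reader is invited to fill in the simple details.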
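The trivial fact is a union bound, and it can be checked exactly on any finite joint distribution. The sketch below is illustrative only: the joint law of $(A, B)$ is an arbitrary randomly generated list of atoms, and the value of $\sigma$ is our choice.

```python
import random


def prob(atoms, event):
    """Probability of `event` under a finite joint law given as (a, b, p) atoms."""
    return sum(p for a, b, p in atoms if event(a, b))


rng = random.Random(1)
weights = [rng.random() for _ in range(30)]
total = sum(weights)
# Hypothetical joint distribution of (A, B); A and B need not be independent.
atoms = [(rng.uniform(-3, 3), rng.uniform(-1, 1), w / total) for w in weights]

sigma = 1.7
violations = 0
for _ in range(2000):
    x = rng.uniform(-3, 3)
    eps = rng.uniform(-1, 1)  # eps may have either sign
    lhs = prob(atoms, lambda a, b: a + b <= sigma * x)
    rhs = (prob(atoms, lambda a, b: a <= sigma * (x - eps))
           + prob(atoms, lambda a, b: b <= sigma * eps))
    if lhs > rhs + 1e-12:  # small tolerance for floating-point summation
        violations += 1
```

No violations occur, since every atom counted on the left is counted at least once on the right. Taking $\epsilon$ negative gives the form that is useful when $B$ is a small error term.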



We will prove Lemma 2.6 using Theorem 2.3. Lemma 2.7 will be verified by the moment method and Lemma 2.8 by elementary properties of Chi-square variables. The key to the proof of Lemmas 2.6 and 2.7 is an estimate on the entries of the projection matrix onto the space $V_i^\perp$, presented in Section [Zj



3. Proof of Lemma 2.6 and Lemma 2.7: Opening

Denote by $P_i = (p_{st}(i))_{s,t}$ the projection matrix onto the orthogonal complement $V_i^\perp$. A standard fact in linear algebra is

\[ \mathrm{tr}(P_i) = \sum_s p_{ss}(i) = k_i. \tag{11} \]

Also, as $P_i$ is a projection, $P_i^2 = P_i$. Comparing the traces we obtain

\[ \sum_{s,t} p_{st}(i)^2 = k_i. \tag{12} \]

We now express $X_{i+1}$ using $P_i$,

\[ X_{i+1} = \frac{\|P_i\mathbf a_{i+1}\|^2 - k_i}{k_i} = \frac{\sum_{s,t} p_{st}(i)a_sa_t - k_i}{k_i} := \sum_{s,t} q_{st}(i)a_sa_t - 1, \]

where $a_1 = a_{i+1,1}, \dots, a_n = a_{i+1,n}$ are the coordinates of the vector $\mathbf a_{i+1}$ and

\[ q_{st}(i) := \frac{p_{st}(i)}{k_i}. \]

By (11) and (12) we have $\sum_s q_{ss}(i) = 1$ and $\sum_{s,t} q_{st}(i)^2 = \frac{1}{k_i}$.

Because $E a_s = 0$ and $E a_s^2 = 1$, and the $a_s$ are mutually independent, we can show by using a routine calculation that (see Section 6)

\[ E(X_{i+1}^2|\mathcal E_i) = \frac{2}{k_i} - \sum_s q_{ss}(i)^2(3 - E a_s^4), \tag{13} \]

where $\mathcal E_i$ is the $\sigma$-algebra generated by the first $i$ rows of $A_n$.
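The trace identities and the conditional second moment formula can be checked exactly in a small case. The following sketch is illustrative and not from the paper: it fixes two arbitrary rows in $\mathbb R^6$ (our choice), takes the next row to have iid entries uniform on $\{-1, +1\}$ (so that the fourth moment equals 1), and enumerates all sign vectors.

```python
import itertools
import math


def complement_projection(rows, n):
    """Projection matrix onto the orthogonal complement of span(rows).
    Assumes the rows are linearly independent."""
    basis = []
    for v in rows:
        r = [float(x) for x in v]
        for q in basis:
            c = sum(x * y for x, y in zip(r, q))
            r = [x - c * y for x, y in zip(r, q)]
        norm = math.sqrt(sum(x * x for x in r))
        basis.append([x / norm for x in r])
    return [[(1.0 if s == t else 0.0) - sum(q[s] * q[t] for q in basis)
             for t in range(n)] for s in range(n)]


n = 6
rows = [[1, -1, 1, 1, -1, 1], [1, 1, -1, 1, 1, 1]]  # hypothetical first i = 2 rows
k = n - len(rows)                                    # co-dimension k_i = n - i
P = complement_projection(rows, n)
trace = sum(P[s][s] for s in range(n))                          # should equal k
frobenius = sum(P[s][t] ** 2 for s in range(n) for t in range(n))  # should equal k

Q = [[P[s][t] / k for t in range(n)] for s in range(n)]


def X(a):
    """X = sum_{s,t} q_st a_s a_t - 1 for a given row vector a."""
    return sum(Q[s][t] * a[s] * a[t] for s in range(n) for t in range(n)) - 1.0


signs = list(itertools.product((-1.0, 1.0), repeat=n))
second_moment = sum(X(a) ** 2 for a in signs) / len(signs)
# Predicted conditional second moment with fourth moment E a_s^4 = 1.
predicted = 2.0 / k - sum(Q[s][s] ** 2 for s in range(n)) * (3.0 - 1.0)
```

The exact average `second_moment` agrees with `predicted`, and `trace` and `frobenius` both equal $k_i$. For gaussian entries the fourth moment is 3 and the correction term vanishes, leaving $2/k_i$.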
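Define

\[ Y_{i+1} := X_{i+1}^2 - E(X_{i+1}^2|\mathcal E_i) \]

and

\[ Z_{i+1} := \frac{2}{k_i} - \sum_s q_{ss}(i)^2(3 - E a_s^4). \]

The reason we split $X_{i+1}^2$ into the sum of $Y_{i+1}$ and $Z_{i+1}$ is that $E(Y_{i+1}|\mathcal E_i) = 0$ and its variance can be easily computed.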

Lemma 3.1.

\[ P\left(\frac{\big|\sum_{i < n_0} Y_{i+1}\big|}{\sqrt{2\log n}} \ge \log^{-1/3+o(1)} n\right) \le \log^{-1/3+o(1)} n. \]



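To complete the proof of Lemma 2.7, we show that the sum of the $Z_i$ is negligible after centering by $2\log n$,

\[ \frac{\sum_{i < n_0} Z_{i+1} - 2\log n}{\sqrt{2\log n}} = O\left(\frac{\log\log n}{\sqrt{2\log n}}\right) = o(\log^{-1/3} n). \tag{14} \]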



Our main technical tool will be the following lemma.
Lemma 3.2. With probability $1 - O(n^{-100})$ we have

\[ \sum_{i < n_0}\sum_s q_{ss}(i)^2 = O(\log\log n). \]
Notice that $E a_s^4$ is uniformly bounded (by condition CO); it follows that with probability $1 - O(n^{-100})$,

\[ \sum_{i < n_0}\sum_s q_{ss}(i)^2\,|3 - E a_s^4| = O(\log\log n), \]

proving (14).



4. Proof of Lemma 2.6 and Lemma 2.7: Mid game



The key idea for proving Lemma 3.2 is to establish a good upper bound for the diagonal entries $p_{ss}(i)$. For this, we need some new tools. Our main ingredient is the following delocalization result, which is a variant of a result from [22], asserting that with high probability all unit vectors in the orthogonal complement of a random subspace with high dimension have small infinity norm.

Lemma 4.1. For any constant $\beta > 0$ the following holds for all sufficiently large constants $\alpha > 0$. Assume that the components of $\mathbf a_1, \dots, \mathbf a_{n_1}$, where $n_1 := n - n\log^{-4\alpha} n$, are independent random variables of mean zero, variance one and bounded in absolute value by $\log^\beta n$. Then with probability $1 - O(n^{-100})$, the following holds for all unit vectors $\mathbf v$ of the space $V_{n_1}^\perp$:

\[ \|\mathbf v\|_\infty = O(\log^{-2\alpha} n). \]



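Proof. (of Lemma 3.2 assuming Lemma 4.1) Write

\[ \sum_{i < n_0}\sum_s q_{ss}(i)^2 = \sum_{i \le n_1}\sum_s q_{ss}(i)^2 + \sum_{n_1 < i < n_0}\sum_s q_{ss}(i)^2 := S_1 + S_2. \]

Note that

\[ \sum_s q_{ss}(i)^2 \le \sum_{s,t} q_{st}(i)^2 = \sum_{s,t}\frac{p_{st}(i)^2}{k_i^2} = \frac{1}{k_i} = \frac{1}{n-i}. \]

Hence,

\[ S_1 \le \sum_{i \le n_1}\sum_s q_{ss}(i)^2 \le \sum_{i \le n_1}\frac{1}{n-i} = O(\log\log n). \]

To bound $S_2$, note that

\[ p_{ss}(i) = \mathbf e_s^T P_i \mathbf e_s = \|P_i\mathbf e_s\|^2 = |\langle \mathbf e_s, \mathbf v\rangle|^2 \]

for some unit vector $\mathbf v \in V_i^\perp$.

Thus if $i > n_1$, then $V_i^\perp \subset V_{n_1}^\perp$, and hence by Lemma 4.1

\[ p_{ss}(i) \le \|\mathbf v\|_\infty^2 = O(\log^{-4\alpha} n). \tag{15} \]

It follows that

\[ S_2 \le \sum_{n_1 < i < n_0}\max_s p_{ss}(i)\,\frac{\sum_s p_{ss}(i)}{(n-i)^2} = \sum_{n_1 < i < n_0}\frac{O(\log^{-4\alpha} n)}{n-i} = O(\log^{-4\alpha+1} n), \]

completing the proof of Lemma 3.2. □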



We now focus on the infinity norm of $\mathbf v$ and follow an argument from

Proof. (of Lemma 4.1) By the union bound, it suffices to show that $|v_1| = O(\log^{-2\alpha} n)$ with probability at least $1 - O(n^{-101})$, where $v_1$ is the first coordinate of $\mathbf v$.

Let $B$ be the matrix formed by the first $n_1$ rows $\mathbf a_1, \dots, \mathbf a_{n_1}$ of $A$. Assume that $\mathbf v \in V_{n_1}^\perp$, then

\[ B\mathbf v = 0. \]

Let $\mathbf w$ be the first column of $B$ and $B'$ be the matrix obtained by deleting $\mathbf w$ from $B$. Clearly,

\[ v_1\mathbf w = -B'\mathbf v', \tag{16} \]

where $\mathbf v'$ is the vector obtained from $\mathbf v$ by deleting $v_1$.

We next invoke the following result, which is a variant of [22, Lemma 4.1]. This lemma was proved using a method of Guionnet and Zeitouni [T^, based on Talagrand's inequality.

Lemma 4.2 (Concentration of singular values). For any constant $\beta > 0$ the following holds for all sufficiently large constants $\alpha > 0$. Let $M$ be a random matrix of size $n$ by $n$, where the entries $a_{ij}$ are independent random variables of mean zero, variance one and bounded in absolute value by $\log^\beta n$. Then for any $n/\log^\alpha n \le k \le n/2$, there exist $2k$ singular values of $M$ in the interval $[0, ck/\sqrt n]$, for some absolute constant $c$, with probability at least $1 - O(n^{-100})$.

We can prove Lemma 4.2 by following the arguments in [22, Lemma 4.1] almost word by word.

By the interlacing law and Lemma 4.2, we conclude that $B'$ has $n - n_1$ singular values in the interval $[0, c(n - n_1)/\sqrt n]$ with probability $1 - O(n^{-100})$.

Let $H$ be the space spanned by the left singular vectors of these singular values, and let $\pi$ be the orthogonal projection onto $H$. By definition, the spectral norm of $\pi B'$ is bounded,

\[ \|\pi B'\| \le c(n - n_1)/\sqrt n. \]

Thus (16), together with $\|\mathbf v'\| \le 1$, implies that

\[ |v_1|\,\|\pi\mathbf w\| \le c(n - n_1)/\sqrt n. \tag{17} \]

On the other hand, since the dimension of $H$ is $n - n_1$, Lemma 2.4 implies that $\|\pi\mathbf w\| \ge \sqrt{n - n_1}/2$ with probability $1 - 4\exp(-(n - n_1)/16) = 1 - n^{-\omega(1)}$.

It thus follows from (17) that

\[ |v_1| = O(\log^{-2\alpha} n). \]

□



5. Proof of Lemma 2.6: End game

Recall from Section 2 that conditioned on any first $i$ rows, $|X_{i+1}| = O(k_i^{-3/8})$ with probability $1 - O(\exp(-\log^2 n/2))$. So, by paying an extra term of $O(\exp(-\log^2 n/2))$ in probability, it suffices to justify Lemma 2.6 for the sequence $X'_{i+1} := X_{i+1}\cdot\mathbf 1_{|X_{i+1}| = O(k_i^{-3/8})}$.

On the other hand, the sequence $X'_{i+1}$ is not a martingale difference sequence, so we slightly modify $X'_{i+1}$ to $X''_{i+1} := X'_{i+1} - E(X'_{i+1}|\mathcal E_i)$ and prove the claim for the sequence $X''_{i+1}$. In order to show that this modification has no effect whatsoever, we first demonstrate that $E(X'_{i+1}|\mathcal E_i)$ is extremely small.

Recall that $X_{i+1} = \sum_{s,t} q_{st}(i)a_sa_t - 1$. By the Cauchy-Schwarz inequality and the assumption that the $a_s$ are bounded in absolute value by $\log^{O(1)} n$, we have with probability one

\[ X_{i+1}^2 \le 2\Big(1 + \sum_{s,t} q_{st}(i)^2\sum_{s,t} a_s^2a_t^2\Big) \le n^2\log^{O(1)} n. \tag{18} \]

Thus with probability one

\[ |E(X'_{i+1}|\mathcal E_i)| = |E(X'_{i+1}|\mathcal E_i) - E(X_{i+1}|\mathcal E_i)| \le \exp\big(-(\tfrac12 - o(1))\log^2 n\big). \tag{19} \]

To justify Lemma 2.6 for the sequence $X''_{i+1}$, we apply Theorem 2.3. The key point here is that thanks to the indicator function in the definition of $X'_{i+1}$ and the fact that the difference between $X''_{i+1}$ and $X'_{i+1}$ is negligible, $X''_{i+1}$ is bounded by $O(k_i^{-3/8})$ with probability one, so the conditions $E(|X''_{i+1}|^3|\mathcal E_i) \le \gamma_i E(X''^2_{i+1}|\mathcal E_i)$ in Theorem 2.3 are satisfied with

\[ \gamma_i = O(k_i^{-3/8}) = O(\log^{-3\alpha/8} n). \]

We need to estimate $v_{n_0}^2$ and $s_{n_0}^2$ with respect to the sequence $X''_{i+1}$. However, thanks to the observations above, $X_{i+1}$ and $X''_{i+1}$ are very close, and so it suffices to compute these values with respect to the sequence $X_{i+1}$.

Recall from (13) that

\[ E(X_{i+1}^2|\mathcal E_i) = \frac{2}{k_i} - \sum_s q_{ss}(i)^2(3 - E a_s^4). \]

Also, recall from Section 4 that with probability $1 - O(n^{-100})$,

\[ \sum_{i < n_0}\sum_s q_{ss}(i)^2(3 - E a_s^4) = O(\log\log n). \]



This bound, together with (13) and (18), imply that with probability one 



E(E X^il^i) = E TT + 0(loglogn) = 21ogn + 0(loglog 



n] 



i<no 



i<no 



which in turn implies that v'^ = 21ogn + O (log log n) with probability 1 — n 



"100 



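Using (18) again, because $n^{-100}n^2\log^{O(1)} n = o(1)$, we deduce that

\[ s_{n_0}^2 = 2\log n + O(\log\log n). \tag{20} \]

With another application of (18), we obtain

\[ E\left|\frac{v_{n_0}^2}{s_{n_0}^2} - 1\right| \le O\left(\frac{\log\log n}{\log n}\right) + n^{-100}n^2\log^{O(1)} n. \]

It follows that

\[ E^{1/3}\left|\frac{v_{n_0}^2}{s_{n_0}^2} - 1\right| \le \log^{-1/3+o(1)} n. \]

By the conclusion of Theorem 2.3 and setting $\alpha$ sufficiently large, we conclude

\[ \sup_{x}\left| P\left(\frac{\sum_{i < n_0} X''_{i+1}}{s_{n_0}} \le x\right) - \Phi(x)\right| \le L\left(\frac{\log^{-3\alpha/8} n \times \log n_0}{s_{n_0}} + E^{1/3}\left|\frac{v_{n_0}^2}{s_{n_0}^2} - 1\right|\right) \le \log^{-1/3+o(1)} n, \]

completing the proof of Lemma 2.6.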



6. Proof of Lemma 2.7: End game



Our goal is to justify Lemma 3.1, which together with (14) verifies Lemma 2.7.

We will show that the variance $\mathrm{Var}(\sum_{i < n_0} Y_{i+1})$ is small and then use Chebyshev's inequality. The proof is based on a series of routine, but somewhat tedious calculations. We first show that the expectations of the $Y_{i+1}$'s are zero, and so are the covariances $E(Y_{i+1}Y_{j+1})$, by an elementary manipulation. The variances $\mathrm{Var}(Y_{i+1})$ will be bounded from above by the Cauchy-Schwarz inequality.

We start with the formula $X_{i+1}^2 = (\sum_{s,t} q_{st}(i)a_sa_t)^2 - 2\sum_{s,t} q_{st}(i)a_sa_t + 1$. Observe that

\[ \Big(\sum_{s,t} q_{st}(i)a_sa_t\Big)^2 = \Big(\sum_s q_{ss}(i)a_s^2 + \sum_{s \ne t} q_{st}(i)a_sa_t\Big)^2 = \Big(\sum_s q_{ss}(i)a_s^2\Big)^2 + \Big(\sum_{s \ne t} q_{st}(i)a_sa_t\Big)^2 + 2\Big(\sum_s q_{ss}(i)a_s^2\Big)\Big(\sum_{s \ne t} q_{st}(i)a_sa_t\Big). \]



Expand each term, using the fact that $\sum_s q_{ss}(i) = 1$ and $\sum_{s,t} q_{st}(i)^2 = \frac{1}{k_i}$, we have
s S S s^t

= 1 - Egss(i)'(l - a^) + 2Efe(i)(ZttW(a^a? - 1), 



and 



{Sl,tl}j^{s2,t2} 

2 

= ^ - 2E^«'^(^)^ + 2E^«*(^)^("^a? - 1) + 2 E QsitAi)qs2t2{i)asiatias2at2), 

{si,ti}^{s2,t2} 

as well as 



It follows that 



2 

= CYqst{i)asatf - 1 - 2E9ss(^)(as " 1) - 2 E " + E ^^'^^^^^^"^ ~ Ea/) 
= -2 E - 1) + E Issiifiat - Eaj) + 2 E " 1) + 2 E 

s S s^t s^t 

+ 2 E QsitA'i)Qs2t2{i)asiat^as2at2 +2{J2Qss{i){al - l)){J2^st{i)asat). (21) 

{Sl,tl}¥={^2,t2} 

As $E a_s = 0$, $E a_s^2 = 1$, and the $a_s$'s are mutually independent with each other and with every row of index at most $i$ (and in particular with the $q_{st}(i)$'s), every term in the last formula has zero expectation, and so we infer that $E(Y_{i+1}) = 0$ and $E(Y_{i+1}|\mathcal E_i) = 0$, confirming (13). With the same reasoning, we can also infer that the covariance $E(Y_{i+1}Y_{j+1}) = 0$ for all $j < i$.

It is thus enough to work with the diagonal terms $\mathrm{Var}(Y_{i+1})$. We have
Var(l-+i) = E [ - X qUi){a's - 1) + ^ E ^-(^)'(«' - 

s s 

+ Yl Qss{i)qtt{i){ayt - 1) + ^ 9.t(i)'(a'a? - 1) 

SlT^tl,S2T^t2 S 
{Sl,ti}^{s2,t2} 

After a series of cancellation, and because of Condition CO, we have 



Var(y,+i) < O (e [ X 'issiif + X ^-(^)' + E (i) Qtitiii)qt2t2ii) 

S^t\,S^t2 Sl^tl, 52^*2 Si* 

s s,t s,t s,t 

+ E (iss{ifqtt{i) + E (issiifqstiif + E kss(«)^9«(09st(OI 

+ Yqss{i)qtt{i)qst{if + E lfe(0^««(09st(i)l 

+ Y\qst{ifqss{i)\ 

s,t 

+ E \qss{i)qst^{i)qst2{i)qtit2{i)\ ), 

where the first two rows consist of the squares of the terms appearing in $Y_{i+1}$ (after deleting several sums of zero expected value), and each of the following rows was obtained by expanding the product of each term with the rest in the order of their appearance.

Because $\sum_{s,t} q_{st}(i)^2 = \frac{1}{k_i}$, one has $|q_{st}(i)| \le k_i^{-1/2}$ for all $s,t$. Recall furthermore that $\sum_s q_{ss}(i) = 1$ and $0 \le q_{ss}(i)$ for all $s$. We next estimate the terms under consideration one by one as follows.

Firstly, the sums Yls 9L(0> J2s J2s,t qss{i)qst{if , E^,* qss{ifqst{if , J2s,t qss{i)qtt{i)qst{if , 

^-iid \qst{'i'fqss{j')\ can be bounded by max^^^ \qst{i)\ Yl,s,t qlS)-: ^"^^ so by kj^^"^ ■ 

Secondly, by applying Cauchy-Schwarz inequality if needed, one can bound the sums 

Es,ii,t2 qstS?qst2{i?. ^sr,tus2,t2 \lsiti{i)qsit2ii)qs2ti(J')qs2t2ii)\, and Es,ti,t2 \qssii)qsti{i)qst2(i)qtit2{i)\ 

by2(E.,t4W)',and so by 2fcr2. 

We bound the remaining terms as follows. 



- ^))CYqst{i)asat) 
• Es,t qss{ifqtt{i) + Es.t qss{ifqtt{i) < 2(E. qss{if)iEt Qttii)) = 2 qssi^' 



'Es,t\lss{i)qtt{i)qst{i)\ < 'Es,t Qss{i){qtt{i) +qst{i) ) < +maxs Xl^,* 



• Y.s,t\lss{'i')'^qtt{i)qst{i)\ < snp,^t\qst{i)\T,s,t'iss{i)^qtt{i) < Esfe(«)V\/^ 
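Putting all bounds together we have

\[ \mathrm{Var}\Big(\sum_{i < n_0} Y_{i+1}\Big) = \sum_{i < n_0}\mathrm{Var}(Y_{i+1}) = O\Big(E\Big(\sum_{i < n_0}\sum_s q_{ss}(i)^2\Big) + \sum_{i < n_0} k_i^{-3/2}\Big) = O(\log\log n), \tag{22} \]

where we applied Lemma 3.2 in the last estimate.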



To complete the proof, we note from the estimate of s^^ of Section [s] and from Lemma 
that I Et<no ~ O(loglogn). Thus by Chebyshev's inequality 



3.2 



/| E. <no^m ^ ^ iQg-l/3+o(l) \ ^ log-V3+o(l) 

V -i/21ogn / 



^21^ 



n. 



7. Proof of Lemma 2.8



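Let us first consider the lower tail; it suffices to show

\[ P\left(\frac{\sum_{n_0 \le i \le n-1}\log\frac{\Delta_{i+1}^2}{n-i}}{\sqrt{2\log n}} \le -\log^{-1/2+c} n\right) = o(\exp(-\log^{c/2} n)) \tag{23} \]

for any constant $0 < c < 1/100$.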



By properties of the normal distribution, it is easy to show that and A^_]^ are at least 

exp(— ^ log'^ n) with probability 1 — exp(— J7(log'^ n)), so we can omit these terms from the 
sum. It now suffices to show that 



A2 

P( E ^tI^ < -^og-'/'+^n) = o(exp(-log^/2^)) (24) 
no<t^-3 ^/21og^ 2 



for any small constant < c < 1/100. 
Flipping the inequality inside the probability (by changing the sign of the RHS and swapping the denominators and numerators in the logarithms of the LHS) and using the Laplace transform trick (based on the fact that the $\Delta_{i+1}^2$ are independent), we see that the probability in question is at most

■pi "TT"— 3 n—i TT™""^ pi n—i 



exp(^ log" n) exp(^ log'' n) 

Recall that A?, , is a Chi-square random variable with degree of freedom n — i, so E^r^ — 
_ . Therefore, the numerator in the previous formula is (" "o)(n no i) ^ ^^^2a ^_ 

Because 



;°f""e =o(exp(-log-/^n)), 
exp(^log n) 

the desired bound follows. 

The proof for the upper tail is similar (in fact simpler as we do not need to treat the first 
two terms separately) and we omit the details. 

8. Deduction of Theorem 2.1 from Theorem 2.2

Our plan is to replace one by one the last $n - n_0$ gaussian rows of $A_n$ by vectors of components having zero mean, unit variance, and satisfying Condition CO. Our key tool here is the classical Berry-Esseen inequality. In order to apply this lemma, we will make crucial use of Lemma 4.1.



Lemma 8.1. [H Berry-Esseen inequality] Assume that $\mathbf v = (v_1, \dots, v_n)$ is a unit vector. Assume that $b_1, \dots, b_n$ are independent random variables of mean zero, variance one and satisfying condition CO. Then we have

\[ \sup_x |P(v_1b_1 + \dots + v_nb_n \le x) - P(N(0,1) \le x)| \le c\|\mathbf v\|_\infty, \]

where $c$ is a constant depending only on the parameters appearing in (3).

We remark that in the original setting of Berry and Esseen, it suffices to assume finite third moment.
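For a concrete feel of this inequality, the Kolmogorov distance can be computed exactly in the simplest case. The sketch below is illustrative and not from the paper: it takes $\mathbf v = (1, \dots, 1)/\sqrt n$ and entries uniform on $\{-1, +1\}$, so that the sum is a rescaled binomial variable, and compares the exact distance with $\|\mathbf v\|_\infty = 1/\sqrt n$. The function names are ours.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_distance_bernoulli(n):
    """sup_x |P(S <= x) - Phi(x)| for S = (b_1 + ... + b_n) / sqrt(n),
    with b_i uniform on {-1, +1}, i.e. v = (1, ..., 1) / sqrt(n).
    The supremum of a step function against an increasing function is
    attained at an atom, either at its value or at its left limit."""
    worst = 0.0
    cdf_left = 0.0  # P(S < atom)
    for m in range(n + 1):  # m = number of +1 entries
        atom = (2 * m - n) / math.sqrt(n)
        cdf_right = cdf_left + math.comb(n, m) / 2.0 ** n
        phi = normal_cdf(atom)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst


n = 100
distance = kolmogorov_distance_bernoulli(n)
sup_norm = 1.0 / math.sqrt(n)  # infinity norm of v
ratio = distance / sup_norm
```

In this example `ratio` stays below $1/2$, so the distance is indeed of order $\|\mathbf v\|_\infty$. In the application the vector $\mathbf v$ is far from flat, which is why the delocalization bound of Lemma 4.1 is needed.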

In application, $\mathbf v$ plays the role of the normal vector of the hyperplane spanned by the remaining $n - 1$ rows of $A$, and $\Delta_n = |v_1b_1 + \dots + v_nb_n|$, where $(b_1, \dots, b_n) = \mathbf b$ is the vector to be replaced.



For the deduction, it is enough to show the following. 






HOI H. NGUYEN AND VAN H. VU 



Lemma 8.2. Let $A_n$ be a random matrix with atom variables satisfying condition CO and non-singular with probability one. Assume furthermore that $A_n$ has at least one and at most $\log^\alpha n$ gaussian rows. Let $B_n$ be the random matrix obtained from $A_n$ by replacing a gaussian row vector $\mathbf a$ of $A_n$ by a random vector $\mathbf b = (b_1, \dots, b_n)$ whose coordinates are independent atom variables satisfying condition CO such that the resulting matrix is non-singular with probability one. Then

\[ \sup_x\left| P\left(\frac{\log(\det A_n^2) - \log(n-1)!}{\sqrt{2\log n}} \le x\right) - P\left(\frac{\log(\det B_n^2) - \log(n-1)!}{\sqrt{2\log n}} \le x\right)\right| \le O(\log^{-2\alpha} n). \tag{25} \]

Clearly, Theorem 2.1 follows from Theorem 2.2 by applying Lemma 8.2 $\log^\alpha n$ times.



Proof. (Proof of Lemma 8.2) Without loss of generality, we can assume that $B_n$ is obtained from $A_n$ by replacing the last row $\mathbf a_n$. As $A_n$ is non-singular, $\dim(V_{n-1}) = n - 1$.

By Lemma 4.1, by paying an extra term of $O(n^{-100})$ in probability (which will be absorbed by the eventual bound $\log^{-2\alpha} n$), we may also assume that the normal vector $\mathbf v$ of $V_{n-1}$ satisfies

\[ \|\mathbf v\|_\infty = O(\log^{-2\alpha} n). \]



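Next, observe that

\[ \frac{\log(\det A_n^2) - \log(n-1)!}{\sqrt{2\log n}} = \frac{\sum_{i=0}^{n-2}\log(\Delta_{i+1}^2/(n-i)) + \log n}{\sqrt{2\log n}} + \frac{\log\Delta_n^2}{\sqrt{2\log n}} \]

and

\[ \frac{\log(\det B_n^2) - \log(n-1)!}{\sqrt{2\log n}} = \frac{\sum_{i=0}^{n-2}\log(\Delta_{i+1}^2/(n-i)) + \log n}{\sqrt{2\log n}} + \frac{\log\Delta_n'^2}{\sqrt{2\log n}}, \]

where $\Delta_n$ and $\Delta'_n$ are the distances from $\mathbf a_n$ and $\mathbf b_n$ to $V_{n-1}$ respectively.

By Lemma 8.1, we obtain

\[ \sup_x |P_{\mathbf a_n}(\Delta_n \le x) - P_{\mathbf b_n}(\Delta'_n \le x)| \le c\|\mathbf v\|_\infty = O(\log^{-2\alpha} n). \]

Hence

\[ \sup_x\left| P_{\mathbf a_n}\left(\frac{\log(\det A_n^2) - \log(n-1)!}{\sqrt{2\log n}} \le x\right) - P_{\mathbf b_n}\left(\frac{\log(\det B_n^2) - \log(n-1)!}{\sqrt{2\log n}} \le x\right)\right| = O(\log^{-2\alpha} n), \]

completing the proof of Lemma 8.2.

□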

Appendix A. Simplifying the model: Deducing Theorem 1.1 from Theorem 2.1



In this section, we show that the two extra assumptions that $|a_{ij}| \le \log^\beta n$ and that $A_n$ has full rank with probability one do not violate the generality of Theorem 1.1.

To start, we need a very weak lower bound on $|\det A_n|$.
Lemma A.1. There is a constant $C$ such that

\[ P(|\det A_n| \le n^{-Cn}) \le n^{-1}. \]

Proof. (Proof of Lemma A.1) It follows from [23, Theorem 2.1] that there is a constant $C$ such that $P(\sigma_n(A_n) \le n^{-C}) \le n^{-1}$. Since $|\det A_n|$ is the product of its singular values, the bound follows. □

Remark A.2. The above bound is extremely weak. By modifying the proof in [20], one can actually prove the Tao-Vu lower bound (1) for random matrices satisfying CO. Also, sharper bounds on the least singular value are obtained in [251 IS]. However, for the arguments in this section, we only need the bound in Lemma A.1.

Let us start with the assumption $|a_{ij}| \le \log^\beta n$. We can achieve this assumption using the standard truncation method (see [2] or [M]). In what follows, we sketch the idea.

Notice that by condition CO, we have, with probability at least 1 — exp(— log"'^'^ n), that 
all entries of An have absolute value at most log^ n, for some constant /3 > which may 
depend on the constants in CO. 

We replace the variable aij by the variable a'^j := .|<iog'3 n.j ^ 1^ i, j < n and let 

A'^ be the random matrix formed by a[y Since with probability at least 1 — exp(— log^*^ n). 
An = A'n, it is easy to show that if A'n satisfies the claim of Theorem 1 1 . 1 1 then so does An- 

While the entries of A'^^ are bounded by log^ n, there is still one problem we need to address, 
namely that the new variables a'- do not have mean and variance one. We can achieve this 
by a simple normalization trick. First observe that by property CO, taking /3 sufficiently 
large, it is easy to show that /nj = Ea^^ has absolute value at most n~'^^^'^ and |1 — (7jj| < 
n~^(^), where aij is the standard deviation of a[j. Now define 



" / _ 
and



a. 



Note that a™ now does have mean zero and variance one. Let A'^ and A'^ be the corre- 
sponding matrices of a'-j and a"j respectively. 

By the Brunn-Minkowski inequality we have,

|det«)| < (|det<|^/" + |detiV„|^/"r, 
where Nn is the matrix formed by fiij. 

Since = n^^^^\ by Hadamard's bound | det A'^ril"'^^" ^ n^'^^^). On the other hand, we 



have by Lemma A.l that P(| det^^|^/" > n >1 — n ^. It thus fohows that 



P(|det<| < (l + o(l))|detXl) > 1 
We can prove a matching lower bound by the same argument. From here, we conclude that 



if I det A„ I satisfies the conclusion of Theorem 



1.1 



then so does I det A' I . 



To pass from det(A„) to det(yl„ ), we apply Brunn-Minkowski inequality again, 



|det(<')| < (|det<|^/" + |detiV;|^/")", 

where A''^ is the matrix form by a"j(l — o-ij^)- Note that |1 — cr~j^\ < n~^^^') and \a'-j\ 
log'^(^) n, we infer that | det(^^)| and | det(A^')| are comparable with high probability 



P(|det^^| = (l + o(l))|detX'l) > 1 -n-^ 

Now we address the assumption that $A_n$ has full rank with probability one. Notice that this is usually not true when the $a_{ij}$ have a discrete distribution (such as Bernoulli). However, we find the following simple trick that makes the assumption valid for our study.

Instead of the entry Uij, consider a[j := (1 — e^)^/^ajj + e^o where is uniform on the 
interval [—1, 1] and e is very small, say n~^^^^^. It is clear that the matrix A'^ formed by 
the a'j^j has full rank with probability one. On the other hand, it is easy to show that by 
Brunn-Minkowski inequality and Hadamard's bound 



detA„| = (|detA;|i/"±0(n-5°°)r. 
Furthermore, by Lemma A.1, $|\det A_n| \ge n^{-Cn}$ with probability $1 - n^{-1}$, and so we can conclude as in the previous argument.